Automatic detection of plagiarized spoken responses
Authors
Abstract
This paper addresses the task of automatically detecting plagiarized responses in the context of a test of spoken English proficiency for non-native speakers. A corpus of spoken responses containing plagiarized content was collected from a high-stakes assessment of English proficiency for non-native speakers, and several text-to-text similarity metrics were implemented to compare these responses to a set of materials that were identified as likely sources for the plagiarized content. Finally, a classifier was trained using these similarity metrics to predict whether a given spoken response is plagiarized or not. The classifier was evaluated on a data set containing the responses with plagiarized content and non-plagiarized control responses and achieved accuracies of 92.0% using transcriptions and 87.1% using ASR output (with a baseline accuracy...
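The pipeline described in the abstract (similarity features computed against likely source materials, then a classifier over those features) can be illustrated with a minimal sketch. The abstract does not name the specific metrics or the classifier, so everything here is an assumption for illustration: word n-gram overlap stands in for the text-to-text similarity metrics, and a fixed threshold on trigram overlap stands in for the trained classifier. The function names `ngram_overlap`, `similarity_features`, and `is_plagiarized` are hypothetical.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(response: str, source: str, n: int) -> float:
    """Fraction of the response's distinct n-grams that also occur in the source."""
    resp = ngrams(response.lower().split(), n)
    if not resp:
        return 0.0
    src = ngrams(source.lower().split(), n)
    return len(resp & src) / len(resp)


def similarity_features(response: str, sources: List[str], max_n: int = 3) -> List[float]:
    """One feature per n-gram order: the best overlap over all candidate sources."""
    return [
        max((ngram_overlap(response, s, n) for s in sources), default=0.0)
        for n in range(1, max_n + 1)
    ]


def is_plagiarized(features: List[float], threshold: float = 0.5) -> bool:
    """Assumed stand-in for a trained classifier: threshold the highest-order overlap."""
    return features[-1] >= threshold


if __name__ == "__main__":
    # Hypothetical source material and responses (transcription or ASR text).
    sources = ["the best way to learn a language is to practice every day"]
    copied = "i think the best way to learn a language is to practice"
    control = "my favorite hobby is playing football with friends"
    for text in (copied, control):
        feats = similarity_features(text, sources)
        print(feats, is_plagiarized(feats))
```

In a fuller setup the feature vectors would be passed to a classifier trained on labeled plagiarized and control responses, and the same features could be computed on either manual transcriptions or ASR output.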
Similar resources
Plagiarism Detection
detection procedure consisted of automatic scanning of manuscripts using plagiarism detection software (eTBLAST and CrossCheck) and manual verification of manuscripts suspected of having been plagiarized (more than 10% text similarity). They found that 11% of manuscripts were plagiarized, 8% were true plagiarism, and 3% were self-plagiarism. The Korean Journal of Family Medicine (KJFM) has also...
Intrinsic Plagiarism Detection
Current research in the field of automatic plagiarism detection for text documents focuses on algorithms that compare plagiarized documents against potential original documents. Though these approaches perform well in identifying copied or even modified passages, they assume a closed world: a reference collection must be given against which a plagiarized document can be compared. This raises th...
Who's the Thief? Automatic Detection of the Direction of Plagiarism
Determining the direction of plagiarism (who plagiarized whom in a given pair of documents) is one of the most interesting problems in the field of automatic plagiarism detection. We present here an approach using an extension of the method Encoplot, which won the 1st international competition on plagiarism detection in 2009. We have tested it on a large-scale corpus of artificial plagiarism, w...
AN-EUL method for automatic interpretation of potential field data in unexploded ordnances (UXO) detection
We have applied an automatic interpretation method for potential field data, called AN-EUL, to unexploded ordnance (UXO) prospecting. The method combines the analytic signal and Euler deconvolution approaches. It can be applied to both magnetic and gravity data, as well as to gradient surveys, based upon the concept of the structural index (SI) of a potential anomaly which is relate...
On the Concept of Correct Hits in Spoken Term Detection
In most Information Retrieval (IR) tasks the aim is to find human-comprehensible items of information in large archives. One such task is spoken term detection (STD), where we look for user-entered keywords in a large audio database. To evaluate the performance of a spoken term detection system we have to know the real occurrences of the keywords entered. Although there are standard auto...